Abstract

In a previous publication I showed that counterfactuals grounded in decision making can be given an interpretation in terms of “imaging,” a process of “mass-shifting” among possible worlds. This paper elaborates on this interpretation and shows the converse: imaging can be given an interpretation in terms of a stochastic decision policy in which agents choose actions with certain probabilities. This mapping, from the metaphysical to the physical, should be helpful in assessing whether metaphysically-inspired extensions of current interventional theories are warranted in a given decision making situation.


1 Introduction: Physical and Metaphysical Conceptions of Actions

The traditional accounts of causal decision theory (CDT), most notably those developed by Stalnaker (1972), Lewis (1973), Gardenfors (1988) and Joyce (1999), offer what we might call a metaphysical view of counterfactuals, in which “possible worlds,” “similarity” and “weight shifting” are the basic concepts. In contrast, the structural account of counterfactuals (Pearl, 2000) takes the physical notions of “mechanisms,” “variables,” “measurements” and “interventions” as its basic primitives. This paper deals with the relationships between the two accounts.

If the options available to an agent are specified in terms of their immediate consequences (as in “make him laugh,” “paint the wall red,” “raise taxes” or, in general, do(X = x)), then a rational agent is instructed to maximize the expected utility

$$EU(x) = \sum_y P_x(y)\, U(y) \qquad (1)$$

over all options x. Here, U(y) stands for the utility of outcome Y = y, and P_x(y), the focus of this paper, stands for the (subjective) probability that outcome Y = y would prevail had action do(X = x) been performed and condition X = x firmly established.
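Once P_x(y) is available, Eq. (1) is a plain weighted sum followed by a maximization. The Python sketch below takes the interventional probabilities P_x(y) as given (how to obtain them is the subject of the rest of the paper); the option names, outcomes, probabilities and utilities are all hypothetical and serve only as an illustration.

```python
def expected_utility(p_x, utility):
    """Eq. (1): EU(x) = sum over y of P_x(y) * U(y).

    p_x maps each outcome y to P_x(y); utility maps each y to U(y).
    """
    return sum(prob * utility[y] for y, prob in p_x.items())


def best_option(p_by_option, utility):
    """Return the option x that maximizes EU(x) over all options."""
    return max(p_by_option, key=lambda x: expected_utility(p_by_option[x], utility))


# Hypothetical interventional distributions P_x(y), one per option do(X = x).
p_by_option = {
    "raise_taxes": {"growth": 0.3, "recession": 0.7},
    "keep_taxes": {"growth": 0.6, "recession": 0.4},
}
utility = {"growth": 10.0, "recession": -5.0}  # hypothetical U(y)

print(best_option(p_by_option, utility))  # keep_taxes
```

The maximization itself is trivial; the whole difficulty discussed below lies in choosing the right function P_x(y).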
It has long been recognized that Bayesian conditionalization, i.e., P_x(y) = P(y | x), is inappropriate for serving in Eq. (1), for it leads to paradoxical results of several kinds (see Skyrms, 1980; Pearl, 2000, pp. 108-9). For example, patients would avoid going to the doctor to reduce the probability that they are seriously ill; barometers would be manipulated to reduce the likelihood of storms; doctors would recommend a drug to male and female patients, but not to patients with undisclosed gender; and so on. Yet the question of what function should substitute for P_x(y), despite decades of thoughtful debate (Jeffrey, 1965; Harper et al., 1981; Cartwright, 1983), still seems to baffle philosophers in the 21st century (Weirich, 2008; Arlo-Costa, 2007).

Guided by ideas from structural econometrics (Haavelmo, 1943; Strotz and Wold, 1960),¹ I have explored and axiomatized a conditioning operator called do(x) (Pearl, 1995) that captures the intent of P_x(y) by simulating an intervention in a causal model of interdependent variables (Pearl, 2009).

The idea is simple. To model an action do(X = x) one performs a “mini-surgery” on the causal model, that is, a minimal change necessary for establishing the antecedent X = x, while leaving the rest of the model intact. This calls for removing the mechanism (i.e., equation) that nominally assigns values to variable X and replacing it with a new equation, X = x, that enforces the intent of the specified action. One important feature of this formulation is that P(y | do(x)) can be derived from pre-interventional probabilities, provided one possesses a diagrammatic representation of the processes that govern variables in the domain (Pearl, 2000; Spirtes et al., 2001). Specifically, the post-intervention probability reads:²


$$P(x, y, z \mid do(X = x^*)) = \begin{cases} P(x, y, z)/P(x \mid z) & \text{if } x = x^* \\ 0 & \text{if } x \neq x^* \end{cases} \qquad (2)$$


Here z stands for any realization of the set Z of “past” variables, y is any realization of the set Y of “future” variables, and “past” and “future” refer to the occurrence of the action event X = x*.³
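As a hedged illustration of Eq. (2), the Python sketch below applies the inverse-probability weighting to a small joint distribution P(x, y, z) with binary X, Y and Z. The factors P(z), P(x | z) and P(y | x, z) used to build the joint are hypothetical numbers, and the sketch assumes P(x* | z) > 0 for every z, without which Eq. (2) is undefined.

```python
from itertools import product


def do_intervention(joint, x_star):
    """Eq. (2): P(x, y, z | do(X = x_star)) from a prior joint P(x, y, z).

    joint maps (x, y, z) -> probability; Z holds the "past" variables.
    """
    p_z, p_xz = {}, {}
    for (x, _, z), p in joint.items():
        p_z[z] = p_z.get(z, 0.0) + p
        if x == x_star:
            p_xz[z] = p_xz.get(z, 0.0) + p
    post = {}
    for (x, y, z), p in joint.items():
        if x != x_star:
            post[(x, y, z)] = 0.0  # worlds with X != x* are ruled out
            continue
        if p_xz.get(z, 0.0) == 0.0:
            raise ValueError("Eq. (2) requires P(x* | z) > 0")
        post[(x, y, z)] = p / (p_xz[z] / p_z[z])  # P(x*, y, z) / P(x* | z)
    return post


def build_joint():
    """Hypothetical model Z -> X, (X, Z) -> Y, all variables binary."""
    p_z = {0: 0.6, 1: 0.4}
    p_x1_given_z = {0: 0.2, 1: 0.7}
    p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.8}
    joint = {}
    for x, y, z in product((0, 1), repeat=3):
        px = p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]
        py = p_y1_given_xz[(x, z)] if y == 1 else 1 - p_y1_given_xz[(x, z)]
        joint[(x, y, z)] = p_z[z] * px * py
    return joint


def prob_y1(dist, given_x=None):
    """P(Y = 1), optionally conditioned (Bayes-style) on X = given_x."""
    rows = {w: p for w, p in dist.items() if given_x is None or w[0] == given_x}
    total = sum(rows.values())
    return sum(p for (_, y, _), p in rows.items() if y == 1) / total


joint = build_joint()
print(round(prob_y1(do_intervention(joint, x_star=1)), 4))  # 0.56 (interventional)
print(round(prob_y1(joint, given_x=1), 4))                  # 0.68 (Bayesian conditioning)
```

The gap between 0.56 and 0.68 is the kind of discrepancy that makes Bayesian conditionalization unsuitable for Eq. (1) when the past variables Z influence both X and Y.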

The philosophical literature spawned a totally different perspective on the probability function P_x(y) in (1). In a famous letter to David Lewis, Robert Stalnaker (1972) suggested replacing conditional probabilities with probabilities of conditionals, i.e., P_x(y) = P(x > y), where (x > y) stands for the counterfactual conditional “Y would be y if X were x.” Using a “closest worlds” semantics, Lewis (1973) defined P(x > y) using a probability-revision operation called “imaging,” in which probability mass “shifts” from worlds to worlds, governed by a measure of “similarity.” Whereas Bayes conditioning P(y | x) transfers the entire probability mass from worlds excluded by X = x to all remaining worlds, in proportion to the latter’s prior probabilities P(w), imaging works differently: each excluded world w transfers its mass individually to a select set of worlds S_x(w) that are considered “closest” to w among those satisfying X = x (see Fig. 1). Joyce (1999) used the “\” symbol, as in P(y \ x), to denote the probability resulting from such an imaging process.

¹ These were brought to my attention by Peter Spirtes in 1991.

² The relation between P_x and P takes a variety of equivalent forms, including the back-door formula, truncated factorization, adjustment for direct causes, or the inverse probability weighting shown in (2) (Pearl, 2000, pp. 72-3). I chose the latter form because it is the easiest to motivate without appealing to graphical notation.

³ I will use “future” and “past” figuratively; “affected” and “unaffected” (by X) are technically more accurate (i.e., descendants and nondescendants of X, in graphical terminology). The derivation of (2) requires that processes be organized recursively (avoiding feedback loops); more intricate formulas apply to nonrecursive models. See Pearl (2009, pp. 72-3) or Spirtes et al. (2001) for a simple derivation of this and equivalent formulas.


Figure 1: Weight shifting in Bayesian (a) and imaging (b) conditionalizations.

In (Pearl, 2000, p. 73) I have shown that the transformation defined by the do(x) operator, Eq. (2), can be interpreted as an imaging-type mass transfer if the following two provisions are met.

Provision 1: the choice of “similarity” measure is not arbitrary; worlds with equal histories should be considered equally similar to any given world.

Provision 2: the redistribution of weight within each selection set S_x(w) is not arbitrary either; equally similar worlds should receive mass in proportion to their prior probabilities. This tie-breaking rule is similar in spirit to the Bayesian policy.⁴

Regardless of how we define “similarity,” the Bayesian tie-breaking rule (Provision 2) permits us to write a general expression for the probability function P(w \ x) that results from imaging on x. It reads:

$$P(w \backslash x) = \sum_{w'} P(w')\, P(w \mid S_x(w')) \qquad (3)$$

This compact formula, adopted from Joyce (2009), is applicable to any selection function S_x(w) and gives the final weight P(w \ x) of both excluded and preserved worlds w.⁵
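A minimal Python sketch of Eq. (3), set beside ordinary Bayesian conditioning as in Fig. 1. It assumes worlds are plain labels, the selection function S_x is supplied as a dictionary for the excluded worlds, and preserved worlds keep their own mass (S_x(w') = {w'}, as in footnote 5). The four worlds, their priors and the selection sets are hypothetical.

```python
def bayes_condition(prior, satisfies):
    """Bayesian conditioning: all excluded mass is spread over every surviving
    world in proportion to its prior weight (Fig. 1(a))."""
    total = sum(p for w, p in prior.items() if satisfies(w))
    return {w: (p / total if satisfies(w) else 0.0) for w, p in prior.items()}


def bayesianized_imaging(prior, satisfies, select):
    """Eq. (3): P(w \\ x) = sum over w' of P(w') * P(w | S_x(w')).

    satisfies(w) tells whether world w satisfies X = x.
    select(w') returns S_x(w') for an excluded world w'; preserved worlds keep
    their own mass, S_x(w') = {w'}. Within each S_x(w') mass is shared in
    proportion to prior weight (Provision 2).
    """
    post = {w: 0.0 for w in prior}
    for src, p_src in prior.items():
        targets = {src} if satisfies(src) else select(src)
        norm = sum(prior[w] for w in targets)  # P(S_x(w')), assumed > 0
        for w in targets:
            post[w] += p_src * prior[w] / norm
    return post


# Hypothetical four-world example: X = x holds only in worlds "c" and "d".
prior = {"a": 0.1, "b": 0.4, "c": 0.2, "d": 0.3}
closest = {"a": {"c"}, "b": {"c", "d"}}  # hypothetical selection function S_x


def satisfies(w):
    return w in ("c", "d")


imaged = bayesianized_imaging(prior, satisfies, closest.get)
bayes = bayes_condition(prior, satisfies)
print({w: round(p, 2) for w, p in imaged.items()})  # {'a': 0.0, 'b': 0.0, 'c': 0.46, 'd': 0.54}
print({w: round(p, 2) for w, p in bayes.items()})   # {'a': 0.0, 'b': 0.0, 'c': 0.4, 'd': 0.6}
```

World "a" sends all its mass to its single closest world "c", which is why imaging ends up favoring "c" more than Bayesian conditioning does.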

⁴ Joyce (2009) labeled this mass transfer policy “Bayesianized imaging,” and noted that it violates a tacit assumption made in Gardenfors’s proof that imaging should preserve mixtures (Gardenfors, 1988, pp. 108-113).

⁵ This follows from two observations:

$$S_x(w') = \{w'\} \quad \text{if } w' \Rightarrow X = x$$

$$P(w \mid S_x(w')) = 0 \quad \text{if } w \Rightarrow X \neq x$$

since any such w is not a member of any S_x(w') set.
Accordingly, for any two propositions A and B we can write:

$$P(B \backslash A) = \sum_{w \in B} \sum_{w'} P(w')\, P(w \mid S_A(w'))$$

In this paper, I will first describe how the post-interventional probability in (2) emerges from the imaging probability in (3) and then examine a wider class of imaging operations that give rise to Eq. (2). Using Provisions 1 and 2, I will then use imaging to extend the application of the do(x) operator to a wider set of suppositions, beyond those defined by the structural model. Finally, I will demonstrate that caution needs to be exercised when metaphysical extensions are taken literally, without the careful guidance of decision making considerations.


2 Action as Imaging

To see how Eq. (2) emerges from the mass transfer policy of (3), let us associate a “world” with a given instantiation of the three sets of variables {X, Y, Z}, where X stands for the action variable in do(X = x), Y stands for variables that are potentially affected by the action (i.e., descendants of X in the causal graph), and Z stands for all other variables in the model. A world w would then be a tuple (x, y, z).

Prior to the action, each world is assigned the mass P(x, y, z). After the action do(X = x*) is executed, this mass must be redistributed, since worlds in which X ≠ x* must be ruled out. The post-action weight of w, P(x, y, z | do(X = x*)), is equal to the old weight plus a supplement P(w' → w) that w receives from those worlds w' whose mass vanishes. Aside from satisfying X ≠ x*, each such w' must consider w to be its “most similar neighbor,” that is, w must be a member of S_x*(w').

According to Provision 1 above, worlds in the most-similar set S_x*(w') should share with w' the entire past (Z = z) up to the point where the action occurs.⁶ This means that the supplement weight P(w' → w) that w receives in the transition comes from each and every world w' = (x', y', z') that shares with w = (x, y, z) its z component. The total weight in those w' worlds is

$$\sum_{w' :\, x' \neq x^*,\, z' = z} P(w') = P(X \neq x^*, z) \qquad (4)$$

However, w does not receive all the probability P(w') that w' is prepared to discharge, because w is in competition with other z-sharing worlds in the X = x* subspace (the space of surviving worlds). Since weight is distributed in proportion to the competitors' prior weights, the fraction that w receives from each w' is:

$$\frac{P(w)}{\sum_{y'} P(x^*, y', z)} = \frac{P(x^*, y, z)}{P(x^*, z)} = P(y \mid x^*, z) \qquad (5)$$

⁶ This follows from measuring similarity not by appearance, but rather by the number of mechanism modifications (or “miracles”) necessary for establishing X = x* (see Pearl, 2000, p. 239). Clearly, worlds in which history differs from ours at several points in time require more modifications than one in which history remains intact and only the last mechanism before the action is perturbed.
Multiplying (4) and (5), the total weight transferred to w from all its w' contributors is

$$P(y \mid x^*, z)\, P(X \neq x^*, z) = P(y \mid x^*, z)\, P(z)\, (1 - P(x^* \mid z)) = P(y \mid x^*, z)\, P(z) - P(x^*, y, z),$$

and adding to this w's original weight, P(x*, y, z), we finally obtain:

$$P(w \backslash x^*) = P(y \mid x^*, z)\, P(z) = P(x^*, y, z)/P(x^* \mid z) \qquad (6)$$

which coincides with Eq. (2).

To summarize, we have established the identity

$$P(w \backslash x) = P(w \mid do(x)) \qquad (7)$$

which provides an imaging-grounded justification for the inverse-probability weighting that characterizes the do(x) operator and, conversely, a decision-making justification for the imaging operator.
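The derivation can also be checked numerically. The Python sketch below, an illustration rather than part of the original argument, draws a random strictly positive prior over worlds (x, y, z), performs the world-by-world mass transfer dictated by Provisions 1 and 2, and compares the result with the inverse-probability weighting of Eq. (2). The value labels and the random prior are arbitrary.

```python
import random
from itertools import product


def random_joint(xs, ys, zs, seed=0):
    """A random, strictly positive prior P(x, y, z) over all worlds."""
    rng = random.Random(seed)
    weights = {w: rng.random() + 0.01 for w in product(xs, ys, zs)}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def image_on_action(joint, x_star):
    """Imaging on X = x_star under Provisions 1 and 2: each excluded world
    w' = (x', y', z') sends its mass to S(w') = {(x_star, y, z') : all y},
    split in proportion to prior weight."""
    post = {w: (p if w[0] == x_star else 0.0) for w, p in joint.items()}
    for (x_src, _, z_src), p_src in joint.items():
        if x_src == x_star:
            continue
        targets = [w for w in joint if w[0] == x_star and w[2] == z_src]
        norm = sum(joint[w] for w in targets)  # P(x_star, z')
        for w in targets:
            post[w] += p_src * joint[w] / norm
    return post


def truncated_do(joint, x_star):
    """Eq. (2): P(x_star, y, z) / P(x_star | z) on X = x_star, zero elsewhere."""
    p_z, p_xz = {}, {}
    for (x, _, z), p in joint.items():
        p_z[z] = p_z.get(z, 0.0) + p
        if x == x_star:
            p_xz[z] = p_xz.get(z, 0.0) + p
    return {(x, y, z): (p * p_z[z] / p_xz[z] if x == x_star else 0.0)
            for (x, y, z), p in joint.items()}


joint = random_joint(xs=("x1", "x2", "x3"), ys=("y1", "y2"), zs=("z1", "z2"))
imaged = image_on_action(joint, x_star="x2")
surgery = truncated_do(joint, x_star="x2")
print(max(abs(imaged[w] - surgery[w]) for w in joint) < 1e-12)  # True
```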

3 Imaging as an Extrapolation Principle

It is important to note that the fraction of weight that w receives from w', P(y | x*, z), is the same for every weight-contributing world w' that sees w as its closest neighbor, for they all share the same z. This means that the same result would obtain had we not insisted that each w' individually deliver its weight to its closest neighbors, but allowed instead all the z-sharing worlds w' to first pool their weights together and then deliver the pooled weight onto the recipients in {w : x = x*, z = z'} in proportion to their prior weights P(x*, y, z). There is in fact no way of telling how weight is being transferred in the transition, whether through individual transfer, as in Fig. 2(a), or through “pooled” transfer, as in Fig. 2(b).
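To illustrate that the two transfer schemes cannot be told apart, the Python sketch below performs the individual transfer and the pooled transfer for do(X = x2) on a hypothetical eight-world prior. It also records the fraction of each source's mass that each recipient receives, confirming that every source in a stratum hands a given recipient the same fraction, P(y | x*, z).

```python
# Hypothetical prior over worlds (x, y, z); the action is do(X = "x2").
PRIOR = {
    ("x1", "y1", "z1"): 0.05, ("x1", "y2", "z1"): 0.15,
    ("x2", "y1", "z1"): 0.10, ("x2", "y2", "z1"): 0.20,
    ("x1", "y1", "z2"): 0.20, ("x1", "y2", "z2"): 0.10,
    ("x2", "y1", "z2"): 0.05, ("x2", "y2", "z2"): 0.15,
}


def recipients(prior, x_star, z):
    """Surviving worlds {w : x = x_star, z = z} and their total prior weight."""
    targets = [w for w in prior if w[0] == x_star and w[2] == z]
    return targets, sum(prior[w] for w in targets)


def individual_transfer(prior, x_star):
    """Each excluded world delivers its own mass; also records the fraction
    of each source's mass received by each target."""
    post = {w: (p if w[0] == x_star else 0.0) for w, p in prior.items()}
    fractions = {}
    for src, p_src in prior.items():
        if src[0] == x_star:
            continue
        targets, norm = recipients(prior, x_star, src[2])
        for w in targets:
            fractions[(src, w)] = prior[w] / norm  # equals P(y | x_star, z)
            post[w] += p_src * fractions[(src, w)]
    return post, fractions


def pooled_transfer(prior, x_star):
    """Excluded worlds in each z-stratum pool their mass before delivery."""
    post = {w: (p if w[0] == x_star else 0.0) for w, p in prior.items()}
    for z in {w[2] for w in prior}:
        released = sum(p for w, p in prior.items() if w[0] != x_star and w[2] == z)
        targets, norm = recipients(prior, x_star, z)
        for w in targets:
            post[w] += released * prior[w] / norm
    return post


post_ind, fractions = individual_transfer(PRIOR, "x2")
post_pool = pooled_transfer(PRIOR, "x2")
print(all(abs(post_ind[w] - post_pool[w]) < 1e-12 for w in PRIOR))  # True
```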

The reason for this ambiguity lies in the coarseness of the propositions X = x to which the do(x) operator is applicable. In structural models, such propositions are limited to elementary instantiations of individual variables and to conjunctions of such instantiations, but do not include disjunctions such as do(X = x or Y = y) or do(X ≠ x). The structural definition, which invokes equation removal, insists on having a unique solution for all variables before and after the intervention, and therefore cannot allow for ambiguity in the form of the disjunction do(X = x or X = x').

This limitation prevents us from isolating one single world w' = (x', y', z') and watching how its weight is distributed according to Eq. (2). To do so would require us to compute the distribution P(x, y, z | do(¬x', ¬y', ¬z')), which is not definable by the surgery operation of structural models, since negations cannot be expressed as conjunctions of elementary propositions X = x, Y = y.

If we take imaging as an organizing principle, more fundamental than the structural account, we can easily circumvent this limitation and use Eq. (3) together with Provision 1 to compute P(B | do(A)) for any arbitrary propositions A and B. This would give:

$$P(B \mid do(A)) = \sum_{w \in B} \sum_{w'} P(w')\, P(w \mid S_A(w')) \qquad (8)$$

where S_A(w') is the set of all A-worlds for which z = z'.

Figure 2: Imaging using individual mass transfer (a) and pooled mass transfer (b); the two are indistinguishable in the structural account.
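A minimal Python sketch of Eq. (8), assuming worlds are tuples (x, y, z), propositions A and B are given as predicates on worlds, and Provision 1 fixes S_A(w') as the A-worlds sharing z with w' (A-worlds themselves keep their own mass). The example isolates a single world w' = (1, 1, 0) by imaging on the proposition that is true everywhere except at w', which is the kind of negation the surgery operation cannot express. The prior is hypothetical.

```python
def image_on_proposition(prior, in_a, z_of):
    """Eq. (8) under Provisions 1 and 2: each world outside A sends its mass
    to S_A(w'), the A-worlds sharing its z, in proportion to their prior
    weight; worlds inside A keep their own mass."""
    post = {w: (p if in_a(w) else 0.0) for w, p in prior.items()}
    for src, p_src in prior.items():
        if in_a(src):
            continue
        targets = [w for w in prior if in_a(w) and z_of(w) == z_of(src)]
        norm = sum(prior[w] for w in targets)
        if norm == 0.0:
            raise ValueError("S_A(w') is empty or has zero prior weight")
        for w in targets:
            post[w] += p_src * prior[w] / norm
    return post


def prob_of(dist, in_b):
    """P(B) under dist, with B given as a predicate on worlds."""
    return sum(p for w, p in dist.items() if in_b(w))


# Hypothetical prior over worlds (x, y, z), all variables binary.
prior = {
    (0, 0, 0): 0.10, (0, 1, 0): 0.20, (1, 0, 0): 0.05, (1, 1, 0): 0.15,
    (0, 0, 1): 0.25, (0, 1, 1): 0.05, (1, 0, 1): 0.10, (1, 1, 1): 0.10,
}
w_isolated = (1, 1, 0)  # A = "any world except w_isolated"


def in_a(w):
    return w != w_isolated


def z_of(w):
    return w[2]


post = image_on_proposition(prior, in_a, z_of)
print(round(post[(0, 0, 0)], 4))                     # 0.1429
print(round(prob_of(post, lambda w: w[1] == 1), 4))  # 0.4357
```

Only the z = 0 stratum changes: the 0.15 released by w' is shared among the three remaining z = 0 worlds in proportion to their priors, while the z = 1 worlds keep their weights.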

Proponents of metaphysical principles would probably welcome the opportunity to overcome various limitations of the do(x) operator and to extend it with imaging-based extrapolations, beyond the decision making context for which it was motivated. In (Pearl, 2000, Chapter 7), I indeed used such an extension to interpret counterfactuals with non-manipulable antecedents. For example, to define statements such as “She would have been hired had she not been a female,” in which it is difficult to imagine a physical action do(female), I proposed the symbolic removal of a fictitious equation Gender = u_g; the result is, of course, identical to (3). Joyce (2009) has also noted that imaging can answer problems on which the do(x) operator is silent, and his example (Berkson's paradox) falls well within the structural definition of non-manipulative counterfactuals (see Pearl, 2000, p. 206).

Such extensions, which go from interventional to non-interventional counterfactuals, are fairly safe, for the human mind interprets the two types of sentences through the same mental machinery. The extrapolation proposed in (8), however, is of a different character, for it assigns a concrete formal interpretation to a disjunctive action do(A or B) for which no structural definition exists. In the next section I will argue that such extensions should be approached with caution; limitations imposed by structural models are there for a reason: they keep us tuned to physical reality and to the agents operating in that reality.
4 Imaging and Disjunctive Actions


Assume we are given three variables, X = {x1, x2, x3}, Y = {y1, y2} and Z = {z1, z2}, such that X is affected by Z and Y is affected by both X and Z, as shown in Fig. 3. We wish to compute P(y1 | do(x2 or x3)) from the prior probability P(x, y, z), but since do(x2 or x3) is not defined, we resort to P(y1 \ x2 or x3) instead, as given in (8).

Figure 3: Given P(x, y, z), find P(y1 | do(x2 or x3)).

Following the derivation in Section 3, we know that, in every Z = z stratum, each of the four surviving worlds {(x, y, z) : x ∈ {x2, x3}, y ∈ {y1, y2}} receives a fraction P(x, y, z)/P(X ≠ x1, Z = z) of the weight released by the two excluded worlds

{(x', y', z') : x' = x1, y' ∈ {y1, y2}, z' = z}.

The final mass in each surviving world will therefore be:

$$P(x, y, z \backslash x_2 \text{ or } x_3) = \frac{P(x, y, z)}{P(x_2 \text{ or } x_3 \mid z)} = \frac{P(x, y, z)}{P(x_2 \mid z) + P(x_3 \mid z)} \qquad (9)$$

which is strongly reminiscent of the inverse probability formula of the standard do(x) operator (Eq. (2)), and amounts to Bayesian conditioning in each stratum of Z.

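To compute our target quantity, we sum over z and obtain:

$$P(y_1 \backslash x_2 \text{ or } x_3) = \sum_z \frac{P(y_1, z, x_2 \text{ or } x_3)}{P(x_2 \text{ or } x_3 \mid z)} = \sum_z P(z)\, \frac{P(y_1 \mid z, x_2)\, P(x_2 \mid z) + P(y_1 \mid z, x_3)\, P(x_3 \mid z)}{P(x_2 \mid z) + P(x_3 \mid z)} \qquad (10)$$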

This formula can be given a simple interpretation in terms of a stochastic intervention policy: an agent instructed to perform the action do(x2 or x3) first observes the value of Z, then chooses either action do(x2) or do(x3) with probability ratio P(x2 | z) : P(x3 | z).
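A Python sketch checking, on a hypothetical model, that Eq. (10) coincides with the expected outcome of the stochastic policy just described. The numbers for P(z), P(x | z) and P(y1 | x, z) are invented for illustration; the imaging side reweights surviving worlds through Eq. (9), while the policy side follows the agent's two-step choice (observe z, then pick x2 or x3) in closed form.

```python
# Hypothetical model Z -> X, (X, Z) -> Y with X in {x1, x2, x3}.
P_Z = {"z1": 0.3, "z2": 0.7}
P_X_GIVEN_Z = {
    "z1": {"x1": 0.5, "x2": 0.2, "x3": 0.3},
    "z2": {"x1": 0.1, "x2": 0.6, "x3": 0.3},
}
P_Y1_GIVEN_XZ = {
    ("x1", "z1"): 0.5, ("x2", "z1"): 0.8, ("x3", "z1"): 0.4,
    ("x1", "z2"): 0.3, ("x2", "z2"): 0.6, ("x3", "z2"): 0.9,
}
ALLOWED = ("x2", "x3")  # the disjunctive action do(x2 or x3)


def joint():
    """Prior P(x, y, z) over worlds, with Y in {y1, y2}."""
    table = {}
    for z, pz in P_Z.items():
        for x, px in P_X_GIVEN_Z[z].items():
            py1 = P_Y1_GIVEN_XZ[(x, z)]
            table[(x, "y1", z)] = pz * px * py1
            table[(x, "y2", z)] = pz * px * (1 - py1)
    return table


def imaging_answer():
    """Eqs. (9)-(10): reweight surviving worlds by 1 / P(x2 or x3 | z)."""
    total = 0.0
    for (x, y, z), p in joint().items():
        if x in ALLOWED and y == "y1":
            total += p / sum(P_X_GIVEN_Z[z][a] for a in ALLOWED)
    return total


def policy_answer():
    """Observe z, then pick do(x2) or do(x3) with probability ratio
    P(x2 | z) : P(x3 | z); Y then follows P(y | x, z)."""
    total = 0.0
    for z, pz in P_Z.items():
        weights = {a: P_X_GIVEN_Z[z][a] for a in ALLOWED}
        norm = sum(weights.values())
        for a, wgt in weights.items():
            total += pz * (wgt / norm) * P_Y1_GIVEN_XZ[(a, z)]
    return total


print(round(imaging_answer(), 4), round(policy_answer(), 4))  # 0.658 0.658
```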

We see that the interpretation engendered by imaging reflects a commitment to a specific interventional policy that may or may not be compatible with the intent of the action do(x2 or x3).

In the next section we will see that the silence of the structural theory vis-à-vis disjunctive actions is not a sign of weakness but rather a wise warning of a potential ambiguity that deserves the attention of rational agents.
5 Restaurants and Taxi Drivers in the Service of Imaging

Consider the sentence:

“The food was terrible, we should have asked the taxi driver to drop us at any of the other two restaurants in town.”

Let the proposition X = x_i, i = 1, 2, 3, stand for “eating at the i-th restaurant,” and let Y = y1 stand for “fine food.” Assume that the quality of the various restaurants is encoded by the conditional probability P(y | x), with x ∈ {x1, x2, x3} and y ∈ {y1, y2}. We ask whether the imaging interpretation of P(y | do(x2 or x3)) (Eqs. (3), (7), or (10)) would provide an adequate evaluation of the sentence above.

The first question to ask is whether the information available is sufficient for calculating the probability of being dropped off at restaurant x2 (similarly, x3) were we to instruct the driver to avoid restaurant x1. The structural theory says no; the imaging theory says yes. The former argues that the answer we seek is highly sensitive to the process that determines X, the choice of restaurant, and that nothing in the information at hand dictates how a taxi driver would behave once her space of options shrinks from three to two alternatives. The imaging theorist argues that it is highly unlikely (though possible) that a driver who prefers x2 to x3 when three options are available would change her preference under two options. Therefore, in the absence of information to the contrary, we should appeal to Eq. (3) and, since all worlds are equally similar (they all share the same, empty, past Z = ∅), (9) leads us to the Bayesian solution:

$$P(x_2 \mid do(x_2 \text{ or } x_3)) = P(x_2 \backslash x_2 \text{ or } x_3) = \frac{P(x_2)}{P(x_2) + P(x_3)} \qquad (11)$$

which preserves not only preferences but ratios as well.⁷

The structural theorist is not happy with this solution, and claims that, even if we assume “ratio preservation” under shrinking options, that does not guarantee proper evaluation of the hypothetical P(y | do(x2 or x3)) because, even if ratios are preserved by every individual taxi driver, they may not be preserved in probability. To back up this claim he presents a numerical example showing that two different assumptions about drivers' behavior, both consistent with the information available, produce drastically different values for P(x2 | do(x2 or x3)).

Suppose there are two types of taxi drivers in town. Type 1 drivers, designated Z = z1, drop all customers at restaurant x3, i.e.,

$$P(x \mid z_1) = \begin{cases} 1 & \text{if } x = x_3 \\ 0 & \text{otherwise} \end{cases} \qquad (12)$$

⁷ Note that both theories agree that, in principle, Bayesian conditionalization is inadequate for evaluating the probability sought in our story. The reason is that Bayesian conditionalization represents indicative conditionals (e.g., knowing that we were not dropped off at restaurant x1, how likely is it that we will end up eating at x2), while our conditional is subjunctive (e.g., if we were to avoid x1) or interventional (e.g., if we forbade the driver from going to x1). The imaging analyst, however, is willing to compromise, arguing that, if Bayesianized imaging works for definitive actions, it should work for disjunctive actions as well.


Type 2 drivers, designated Z = z2, are not on the payroll of restaurant x3, and behave as follows:

$$P(x \mid z_2) = \begin{cases} 8/9 & \text{if } x = x_1 \\ 1/9 & \text{if } x = x_2 \\ 0 & \text{otherwise} \end{cases} \qquad (13)$$


Ten percent of taxi drivers are of type 1 and 90% are of type 2; thus P(z1) = 0.1 = 1 - P(z2). Accordingly, the prior probabilities for X calculate to:

$$P(x) = \sum_z P(x \mid z)\, P(z) = \begin{cases} 0.80 & \text{for } x = x_1 \\ 0.10 & \text{for } x = x_2 \\ 0.10 & \text{for } x = x_3 \end{cases} \qquad (14)$$

and the Bayesian answer gives equal probabilities to restaurants x2 and x3:

$$P(x_2 \mid x_2 \text{ or } x_3) = 0.50, \qquad P(x_3 \mid x_2 \text{ or } x_3) = 0.50 \qquad (15)$$

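On the other hand, if imaging is applied, Eq. (10) gives a 9 to 1 preference to restaurant x2:

$$P(x_2 \mid do(x_2 \text{ or } x_3)) = \sum_z P(z)\, \frac{P(x_2 \mid z)}{P(x_2 \text{ or } x_3 \mid z)} = \sum_z P(z)\, \frac{P(x_2 \mid z)}{P(x_2 \mid z) + P(x_3 \mid z)} = 0.90 \qquad (16)$$

while

$$P(x_3 \mid do(x_2 \text{ or } x_3)) = 0.10.$$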
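The arithmetic of Eqs. (12) to (16) can be reproduced in a few lines of Python. The sketch below encodes the two driver types, computes the prior P(x), the Bayesian answer (11) and the imaging answer (16), and then repeats the comparison with P(z1) = 0.20, the variant mentioned in the next paragraph; the function names are illustrative only.

```python
P_X_GIVEN_Z = {
    "z1": {"x1": 0.0, "x2": 0.0, "x3": 1.0},      # Eq. (12): type 1 drivers
    "z2": {"x1": 8 / 9, "x2": 1 / 9, "x3": 0.0},  # Eq. (13): type 2 drivers
}
RESTAURANTS = ("x1", "x2", "x3")
ALLOWED = ("x2", "x3")


def prior_x(p_z):
    """Eq. (14): P(x) = sum over z of P(x | z) P(z)."""
    return {x: sum(p_z[z] * P_X_GIVEN_Z[z][x] for z in p_z) for x in RESTAURANTS}


def bayes_answer(p_z, target):
    """Eqs. (11) and (15): P(target) / [P(x2) + P(x3)]."""
    px = prior_x(p_z)
    return px[target] / sum(px[x] for x in ALLOWED)


def imaging_answer(p_z, target):
    """Eq. (16): sum over z of P(z) P(target | z) / [P(x2 | z) + P(x3 | z)]."""
    return sum(pz * P_X_GIVEN_Z[z][target] / sum(P_X_GIVEN_Z[z][x] for x in ALLOWED)
               for z, pz in p_z.items())


p_z = {"z1": 0.1, "z2": 0.9}
print({x: round(p, 2) for x, p in prior_x(p_z).items()})  # {'x1': 0.8, 'x2': 0.1, 'x3': 0.1}
print(round(bayes_answer(p_z, "x2"), 2), round(imaging_answer(p_z, "x2"), 2))  # 0.5 0.9

# With P(z1) = 0.20 the Bayesian answer favors x3 while imaging favors x2.
p_z_alt = {"z1": 0.2, "z2": 0.8}
print(round(bayes_answer(p_z_alt, "x2"), 4), round(imaging_answer(p_z_alt, "x2"), 4))  # 0.3077 0.8
```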

This proves, claims the structural analyst, that, contrary to Eq. (11), P(x) tells us nothing about which restaurant we are likely to end up in if we allow each driver to follow her own preferences and average over all drivers. What started as equal prior probabilities (Eqs. (14)-(15)) turns out to be a 9:1 preference, just by explicating the behavior of each driver. (Reversal of probability ratios is, of course, easy to demonstrate by assuming, for example, P(z1) = 0.20.) This drastic change will be reflected in the expected food quality P(y1 | do(x2 or x3)). Even if we assume that Y is independent of Z conditional on X, the fact that P(x2 | do(x2 or x3)) depends so critically on the distribution of driver types amounts to saying that the information available is insufficient for calculating the target quantity. Yet imaging commits to the Bayesian answer (11) if we are not given P(x | z).
Such sensitivity does not occur in the calculation of non-disjunctive interventions like P(y1 | do(x2)): regardless of the story about taxi drivers and their preferences, as long as we average over types, we get the same answer,

$$P(y_1 \mid do(x_2)) = P(y_1 \mid x_2).$$

The imaging analyst replies that, while he appreciates the warning that the structural theory gives to decision makers, imaging is an epistemic theory and, as such, it views sensitivity to mechanisms as a virtue, not a weakness. If an agent truly believes the story about the two types of taxi drivers, it is only rational that the agent also believes in the 9:1 probability ratio. If, on the other hand, the agent has no basis for supposing this story over other conspiratorial theories, the Bayesian answer is the best one can expect.

There is nothing new in sensitivity to processes, argues the imaging analyst; it is commonplace even in the structural context. We know, for example, that the probability of causation (i.e., the probability that Y would be different had X been different (Pearl, 2000)) is sensitive to the mechanism underlying the data. Yet the structural theory does not proclaim this probability “undefined.” On the contrary, it considers it well-defined in fully specified models, and “unidentified” in partially specified models, where some aspects of the underlying mechanisms are not known. In contrast, counterfactuals with disjunctive antecedents are deemed “undefined” even in fully specified structural models.

Here, the structural analyst replies that disjunctive counterfactuals are defined, albeit in the form of an interval, with

$$P(y \mid do(x_2 \text{ or } x_3)) \in [P(y \mid do(x_2)),\, P(y \mid do(x_3))]$$

and that, if one insists on obtaining a definitive value for P(y | do(x2 or x3)), a fully specified model should take into account how each agent reacts to shrinking options; the Bayesian assumption that probability ratios should be preserved by default, at the population level, is utterly ad hoc.
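The interval can be illustrated under the simplifying assumption already entertained in the taxi story, that Y is independent of Z given X, so that P(y1 | do(x)) = P(y1 | x). Any policy that ends at x2 with overall probability q and at x3 otherwise then yields a mixture of the two interventional values, and the Bayesian weights (0.5, 0.5) of Eq. (15) and the imaging weights (0.9, 0.1) of Eq. (16) land at different points of the same interval. The food-quality numbers in this Python sketch are hypothetical.

```python
# Hypothetical restaurant quality, assuming Y is independent of Z given X, so
# that P(y1 | do(x)) = P(y1 | x) for each restaurant.
P_Y1_DO = {"x2": 0.7, "x3": 0.2}


def disjunctive_value(q_x2):
    """P(y1 | do(x2 or x3)) for a policy that ends at x2 with overall
    probability q_x2 and at x3 otherwise."""
    return q_x2 * P_Y1_DO["x2"] + (1 - q_x2) * P_Y1_DO["x3"]


low, high = sorted(P_Y1_DO.values())
for label, q in [("Bayesian, Eq. (15)", 0.5), ("imaging, Eq. (16)", 0.9)]:
    value = disjunctive_value(q)
    assert low <= value <= high  # every such mixture stays inside the interval
    print(label, round(value, 2))
# Bayesian, Eq. (15) 0.45
# imaging, Eq. (16) 0.65
```

The interval alone does not single out a point; which point is correct depends on how drivers actually react to the shrunken option set.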

The conversation would probably not end here, but the paper must.


6 Conclusions

The structural account of actions and counterfactuals provides a decision theoretic justification for two provisions associated with imaging operations: (i) worlds with equal histories should be deemed equally similar, and (ii) ties are broken in a Bayesian fashion. Extending these provisions beyond the context of decision making led to a plausible interpretation of non-manipulative counterfactuals using either the structural or the possible-worlds account. However, extensions to disjunctive actions were shown to require assumptions that one may not be prepared to make in any given situation. This paper explicates some of these assumptions and helps clarify the relationships between the structural and imaging accounts of counterfactuals.